Interpretability and Generalization of Deep Low-Level Vision Models
Low-level vision is an important class of computer vision tasks, encompassing various image
restoration problems such as image super-resolution, image denoising, and image deraining. In recent
years, deep learning has become the de facto approach to low-level vision, owing to its strong
performance and ease of use. By training on large amounts of paired data, deep low-level vision
models are expected to learn rich semantic knowledge and process images intelligently in real-world
applications. However, because our understanding of both deep learning models and low-level vision
tasks remains limited, we cannot yet explain the successes and failures of these deep low-level
vision models. Deep learning models are widely regarded as ``black boxes'' due to their complexity
and non-linearity: we cannot tell what information a model uses when processing an input, or whether
it has learned what we intended. When a model misbehaves, we cannot identify the underlying source
of the problem, such as the generalization problem of low-level vision models. This research
proposes interpretability analysis of deep low-level vision models to gain deeper insight into deep
learning models for low-level vision tasks. I aim to elucidate the mechanisms of the deep learning
approach and to discern why these methods succeed or fall short. This is the first study to perform
interpretability analysis on deep low-level vision models.
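One common family of interpretability tools that fits this setting is attribution: measuring how much each input pixel influences a chosen output pixel of a restoration model. The sketch below uses a toy mean-filter "model" and numerical gradients; both the model and the finite-difference scheme are illustrative stand-ins, not the analysis performed in this work.

```python
import numpy as np

def toy_restoration_model(img):
    # Stand-in for a deep low-level vision model: a simple 3x3 mean filter.
    pad = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += pad[1 + dy : 1 + dy + img.shape[0],
                       1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0

def attribution_map(model, img, y, x, eps=1e-3):
    """Numerical-gradient attribution: sensitivity of the output pixel
    at (y, x) to a small perturbation of each input pixel."""
    base = model(img)[y, x]
    attr = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            perturbed = img.copy()
            perturbed[i, j] += eps
            attr[i, j] = (model(perturbed)[y, x] - base) / eps
    return attr

img = np.random.rand(8, 8)
attr = attribution_map(toy_restoration_model, img, 4, 4)
# For this toy model, only the 3x3 neighborhood of (4, 4) gets
# non-zero attribution, exposing the model's effective receptive field.
```

For a real network the same question is usually answered with backpropagated gradients rather than per-pixel perturbation, but the interpretation is identical: the map shows which input regions the model actually used.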
Rethinking the Pipeline of Demosaicing, Denoising and Super-Resolution
Incomplete color sampling, noise degradation, and limited resolution are the
three key problems that are unavoidable in modern camera systems. Demosaicing
(DM), denoising (DN), and super-resolution (SR) are core components in a
digital image processing pipeline to overcome the three problems above,
respectively. Although each of these problems has been studied actively, the
mixture problem of DM, DN, and SR, which is of higher practical value, lacks
sufficient attention. Such a mixture problem is usually solved by a sequential
solution (applying each method independently in a fixed order: DM → DN →
SR), or is simply tackled by an end-to-end network without sufficient
analysis of the interactions among tasks, resulting in an undesired drop
in final image quality. In this paper, we rethink the mixture problem
from a holistic perspective and propose a new image processing pipeline: DN
→ SR → DM. Extensive experiments show that simply modifying the usual
sequential solution by leveraging our proposed pipeline could enhance the image
quality by a large margin. We further adopt the proposed pipeline into an
end-to-end network, and present Trinity Enhancement Network (TENet).
Quantitative and qualitative experiments demonstrate the superiority of our
TENet to the state-of-the-art. Besides, we notice the literature lacks a full
color sampled dataset. To this end, we contribute a new high-quality full color
sampled real-world dataset, namely PixelShift200. Our experiments show the
benefit of the proposed PixelShift200 dataset for raw image processing.
Comment: Code is available at: https://github.com/guochengqian/TENe
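The proposed ordering can be illustrated with stand-in operators: a placeholder denoiser, upsampler, and naive bilinear demosaicer composed in the DN → SR → DM order. Everything below is an illustrative assumption rather than the paper's TENet; in particular, real raw-domain denoising and SR must respect the Bayer CFA pattern, which this toy version ignores.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter, zoom

def denoise(raw, sigma=0.8):
    # Stand-in denoiser: light Gaussian smoothing.
    return gaussian_filter(raw, sigma=sigma)

def super_resolve(raw, scale=2):
    # Stand-in SR: cubic spline upsampling.
    return zoom(raw, scale, order=3)

def demosaic(raw):
    # Naive bilinear demosaicing of an assumed RGGB Bayer mosaic:
    # interpolate each color plane from its sampled positions.
    h, w = raw.shape
    r_mask = np.zeros((h, w))
    r_mask[0::2, 0::2] = 1.0
    b_mask = np.zeros((h, w))
    b_mask[1::2, 1::2] = 1.0
    g_mask = 1.0 - r_mask - b_mask
    k = np.array([[0.25, 0.5, 0.25],
                  [0.5,  1.0, 0.5],
                  [0.25, 0.5, 0.25]])
    out = np.zeros((h, w, 3))
    for c, mask in enumerate((r_mask, g_mask, b_mask)):
        num = convolve(raw * mask, k, mode="mirror")
        den = convolve(mask, k, mode="mirror")
        out[..., c] = num / den  # every 3x3 window contains each color
    return out

raw = np.random.rand(16, 16)                 # pretend RGGB mosaic
rgb = demosaic(super_resolve(denoise(raw)))  # proposed order: DN -> SR -> DM
```

Swapping the composition to `super_resolve(demosaic(...))`-style orders reproduces the usual sequential baseline the paper compares against.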
Networks are Slacking Off: Understanding Generalization Problem in Image Deraining
Deep deraining networks, while successful in laboratory benchmarks,
consistently encounter substantial generalization issues when deployed in
real-world applications. A prevailing perspective in deep learning encourages
the use of highly complex training data, with the expectation that a richer
image content knowledge will facilitate overcoming the generalization problem.
However, through comprehensive and systematic experimentation, we discovered
that this strategy does not enhance the generalization capability of these
networks. On the contrary, it exacerbates the tendency of networks to overfit
to specific degradations. Our experiments reveal that better generalization in
a deraining network can be achieved by reducing the complexity of the
training data. This is because the networks are slacking off during training,
that is, they learn the least complex elements of the image content and
degradation to minimize training loss. When the complexity of the background
image is less than that of the rain streaks, the network will prioritize the
reconstruction of the background, thereby avoiding overfitting to the rain
patterns and resulting in improved generalization performance. Our research not
only offers a valuable perspective and methodology for better understanding the
generalization problem in low-level vision tasks, but also shows promising
practical potential.
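The data-simplification idea can be sketched by synthesizing rainy/clean training pairs over a deliberately simple background, so that the rain streaks become the most complex element in the input and the network cannot "slack off" by memorizing background content. The streak generator and the gradient background below are illustrative assumptions, not the paper's actual data recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def rain_streaks(h, w, n=40, length=6):
    # Synthesize an additive rain layer: short bright vertical segments
    # at random positions with random intensities.
    rain = np.zeros((h, w))
    for _ in range(n):
        y = rng.integers(0, h - length)
        x = rng.integers(0, w)
        rain[y : y + length, x] += rng.uniform(0.2, 0.5)
    return rain

def make_pair(background):
    # Training pair: rainy input and clean target.
    rainy = np.clip(background + rain_streaks(*background.shape), 0.0, 1.0)
    return rainy, background

# "Simple" background: a smooth gradient whose complexity is far below
# that of the rain streaks, per the paper's observation.
simple_bg = np.tile(np.linspace(0.2, 0.8, 64), (64, 1))
rainy, clean = make_pair(simple_bg)
```

A network trained on such pairs is forced to model the rain degradation itself, which is the behavior the abstract links to better generalization.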
Recursive Generalization Transformer for Image Super-Resolution
Transformer architectures have exhibited remarkable performance in image
super-resolution (SR). Owing to the quadratic computational complexity of
self-attention (SA) in Transformers, existing methods tend to apply SA within
a local region to reduce overhead. However, this local design restricts the
exploitation of global context, which is crucial for accurate image
reconstruction. In this work, we propose the Recursive Generalization
Transformer (RGT) for image SR, which can capture global spatial information
and is suitable for high-resolution images. Specifically, we propose the
recursive-generalization self-attention (RG-SA). It recursively aggregates
input features into representative feature maps, and then utilizes
cross-attention to extract global information. Meanwhile, the channel
dimensions of attention matrices (query, key, and value) are further scaled to
mitigate the redundancy in the channel domain. Furthermore, we combine the
RG-SA with local self-attention to enhance the exploitation of the global
context, and propose the hybrid adaptive integration (HAI) for module
integration. The HAI allows the direct and effective fusion between features at
different levels (local or global). Extensive experiments demonstrate that our
RGT outperforms recent state-of-the-art methods quantitatively and
qualitatively. Code is released at https://github.com/zhengchen1999/RGT.
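The aggregate-then-cross-attend idea behind RG-SA can be sketched in a few lines of numpy: recursively pool the input tokens down to a small representative set, then let full-resolution queries attend to those representatives, so attention cost grows linearly in the number of input tokens instead of quadratically. The dimensions, pooling scheme, and random projections here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate(x, factor=2, keep=8):
    # Recursively average-pool neighboring tokens until only a small
    # representative set remains (a crude stand-in for recursive
    # feature aggregation).
    while x.shape[0] > keep:
        n = (x.shape[0] // factor) * factor
        x = x[:n].reshape(-1, factor, x.shape[1]).mean(axis=1)
    return x

def cross_attention(x, rep, d=16, seed=0):
    # Queries come from the full-resolution tokens, keys/values from the
    # aggregated representatives -> global context at O(len(x) * len(rep)).
    rng = np.random.default_rng(seed)
    wq, wk, wv = (rng.standard_normal((x.shape[1], d)) * 0.1
                  for _ in range(3))
    q, k, v = x @ wq, rep @ wk, rep @ wv
    attn = softmax(q @ k.T / np.sqrt(d))   # (len(x), len(rep))
    return attn @ v

tokens = np.random.default_rng(1).standard_normal((256, 32))  # H*W tokens
rep = aggregate(tokens)             # few representative tokens
out = cross_attention(tokens, rep)  # one output per input token
```

In the paper this global branch is further combined with local self-attention and fused via HAI; the sketch covers only the global path.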